\subsubsection{Time}
\label{sub-sec:exp-time}


% Present a general summary of the time results
In order to analyze the time spent on tests, we gathered data from the
preparation and execution of the tested use cases.
Table~\ref{tbl:result-time} summarizes the time spent by each
technique.
 
\begin{figure*}[!t]
\centering
\includegraphics[width=\textwidth]{figs/tempo_2.png}
\caption{(a) Box plot comparison of the time spent by each technique, in minutes.
(b) Time spent per use case by each technique, in minutes.}
\label{fig:time}
\end{figure*}

% Present absolute numbers
Testing all 27 use cases took 55 hours in total: 25 hours with the manual
(ad hoc) technique and 30 hours with TaRGeT. The 5-hour difference in
{\bf total time} between TaRGeT and ad hoc tests breaks down into 1 hour of
test preparation and 4 hours of test execution.
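
Reading the preparation and execution columns of Table~\ref{tbl:result-time},
this breakdown is simply the sum of the per-phase differences, where
$\Delta T_T$ is an illustrative notation for the difference in total time
(TaRGeT minus ad hoc):

\[
\Delta T_T = \underbrace{(10 - 9)}_{\mathrm{preparation}}
 + \underbrace{(20 - 16)}_{\mathrm{execution}} = 1 + 4 = 5 \mbox{ hours}.
\]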



% Present statistical values
Considering all tested use cases, the {\bf average time} spent per use case
was 55 minutes with manual tests against 68 minutes with TaRGeT, a difference
of only 13 minutes {\it per use case}.
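
The average time per use case is understood here as the arithmetic mean of
$T_T$ over the $n = 27$ tested use cases. Writing $T_T^{(i)}$ for the total
time of use case $i$ (an illustrative notation, not used elsewhere in this
section), the comparison reads:

\[
\overline{T_T} = \frac{1}{n}\sum_{i=1}^{n} T_T^{(i)}, \qquad
\overline{T_T}^{\,\mathrm{TaRGeT}} - \overline{T_T}^{\,\mathrm{ad\,hoc}}
 = 68 - 55 = 13 \mbox{ minutes}.
\]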


% Figure
The box plot in Figure~\ref{fig:time}(a) gives an overview of the comparison
between both techniques.
% Outliers
The outliers correspond to the time spent on the most complex use cases of
the tested release. Since these outliers are important to our analysis, we
kept them in the data and discuss them later in this section.
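
For reference, under the usual box plot convention (Tukey's fences, which we
assume here since the plotting parameters are not stated), a use case is drawn
as an outlier when its time $t$ falls outside the fences
\[
t < Q_1 - 1.5\,\mathrm{IQR} \quad \mbox{or} \quad t > Q_3 + 1.5\,\mathrm{IQR},
\qquad \mathrm{IQR} = Q_3 - Q_1,
\]
where $Q_1$ and $Q_3$ are the first and third quartiles of the times measured
for the corresponding technique.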


\begin{table}[!t]
\caption{Time spent on tests by each technique}
\label{tbl:result-time}
\centering
\begin{tabular}{c|c|c|c}
\hline
\bfseries & \bfseries Preparation & \bfseries Execution & \bfseries Total\\
\hline\hline
\bfseries Ad hoc & 9 hours & 16 hours & 25 hours\\
\hline
\bfseries TaRGeT & 10 hours & 20 hours & 30 hours\\
\hline
\hline
\bfseries Total & 19 hours & 36 hours & 55 hours\\
\hline
\end{tabular}
\end{table}




% Present specific scenarios
After the box plot analysis, we compared in more detail the time spent on
each use case by each approach.
Figure~\ref{fig:time}(b) presents a bar chart of the total time metric
($T_T$) per use case. Even though TaRGeT tests generally took longer, some
manual tests took more time than their TaRGeT counterparts ({\it i.e.}, the
outliers UC01 and UC04).
These cases brought the mean times of both techniques closer together.


Despite the similar average times,
TaRGeT took longer in 60\% of the use cases, against 40\% for manual
tests.
For the most complex use cases (UC01 to UC15), this difference
is even greater: TaRGeT took longer in 76\% of them, against only 24\% for
manual tests. On the simple use cases, however, the difference is slight and
reversed: ad hoc tests took longer in 42\% of the use cases and TaRGeT in 35\%
of them (in the remaining 23\%, both techniques took almost the same time).
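
The proportions above can be stated precisely as follows. For a group of $n$
use cases, let $d_i = T_T^{\mathrm{TaRGeT},(i)} - T_T^{\mathrm{ad\,hoc},(i)}$
be the difference in total time for use case $i$, and let
$\varepsilon \geq 0$ be a tolerance below which both times are considered
almost equal (this notation is illustrative, and the tolerance actually used
in the experiment is not stated here):

\[
p_{\mathrm{TaRGeT}} = \frac{\#\{i : d_i > \varepsilon\}}{n}, \qquad
p_{\mathrm{ad\,hoc}} = \frac{\#\{i : d_i < -\varepsilon\}}{n}, \qquad
p_{\approx} = \frac{\#\{i : |d_i| \le \varepsilon\}}{n}.
\]

By construction, the three proportions add up to 100\%, as in the simple use
cases ($42\% + 35\% + 23\% = 100\%$).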


% Present the Wilcoxon test for total time
Since the mean times of both techniques were similar, but the pairwise
and proportion comparisons led to different conclusions, we applied a
Mann-Whitney-Wilcoxon test to verify statistically whether either approach
performed better.
% Result
We tested our time hypothesis $H_{\emptyset1}$ and, with a {\it p-value} of
approximately 0.08, could not reject it at a significance level of 5\%.
Thus, the total time ($T_T$) spent by each technique may be statistically
equivalent; however, failing to reject $H_{\emptyset1}$ does not provide
statistical evidence to confirm it. We applied this test both to the general
use case comparison and to the grouped use cases (simple and complex ones).
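
For reference, the Mann-Whitney-Wilcoxon rank-sum statistic compares two
samples of sizes $n_1$ and $n_2$ (here, presumably the times $T_T$ of each
technique) by ranking all $n_1 + n_2$ observations together. With $R_1$ the
sum of the ranks of the first sample, the statistic and its large-sample
normal approximation (without tie correction) are
\[
U_1 = R_1 - \frac{n_1(n_1+1)}{2}, \qquad U_2 = n_1 n_2 - U_1, \qquad
z = \frac{U_1 - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}},
\]
and the two-sided {\it p-value} is $2\,(1 - \Phi(|z|))$, where $\Phi$ is the
standard normal cumulative distribution function. This approximation is shown
only as an illustration; the reported {\it p-value} may have been obtained
with an exact test or with a tie correction.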



% Final assessment
Even though TaRGeT tests seem to take more time, the small difference in
average test duration, together with the non-significant test result,
suggests that the overall time performance of both techniques is comparable.
